A CASSETTE TO ACCUMULATE MULTIPLE PROTEINS THROUGH
SYNTHESIS OF A SELF-PROCESSING POLYPEPTIDE

This invention was made with government support under Grant No. R01-AI 27161-05A1 from
the National Institutes of Health. The government has certain rights in this
invention.

BACKGROUND OF THE INVENTION 

1. Field of the Invention

This invention relates to methods for plant transformation to enhance and control gene
expression. More particularly, this invention relates to a method for expressing more than
one transgene in plants in equimolar amounts from a single promoter.

2. Description of Related Art 

In recent years, the development of plant transformation techniques and of strategies for
enhancing and controlling gene expression has broadened the practical applications of
plant biotechnology. However, realizing the potential of all these techniques requires
dealing with the problems encountered when more than one transgene is expressed in planta.

Current approaches to expressing more than one gene in transgenic plants require the use
of multiple promoters, which in itself presents problems related to levels of expression
from each promoter. For example, the relative levels of expression in potato plants of two
genes encoding two viral coat proteins (CP), which were introduced via a single Ti-derived
transformation vector, differed among plant lines (C. Lawson, et al., Bio/Technology,
8:127-134, 1990). In an alternative approach, plants are retransformed with a second
gene, but this technique may induce gene silencing effects (M. Matzke, et



viruses is desirable. Systems that allow equimolar accumulation of two or more proteins
under the control of a single transcriptional promoter would avoid the problems outlined
above, while providing the additional advantage of producing equal amounts of the two
transgene products in each plant.

Several plant and animal viruses encode proteinases that cleave viral polypeptides,
yielding mature proteins. For instance, plant potyviral genomes are expressed through
the translation of a single polypeptide, which is processed to release multiple individual
viral proteins (J. Riechmann, et al., J. Gen. Virol., 73:1-16, 1992). Three viral proteinase
activities have been implicated in this processing (J. Carrington, et al., EMBO J.,
9:1347-1353, 1990; J. Verchot, et al., Virology, 185:527-535, 1991). One of these,
associated with the nuclear inclusion (NIa) protein, has been widely studied in the case of
tobacco etch potyvirus (TEV) (J. Carrington, et al., J. Virol., 62:2313-2320, 1988;
J. Carrington, et al., J. Virol., 61:2540-2548, 1987), and is responsible for several
processing events involving the large viral polypeptide. NIa from TEV functions during
post-translational processing through the recognition and cleavage of a specific
heptapeptide (J. Carrington, et al., Proc. Natl. Acad. Sci. USA, 85:3391-3395, 1988;
W. Dougherty, et al., EMBO J., 7:1281-1287, 1988). Taking advantage of this
well-characterized proteinase activity, an expression cassette based on the TEV NIa
protein has been developed. This cassette vector allows the synthesis of two or more
proteins in equimolar amounts as part of a polyprotein that is cleaved into individual
mature proteins by the NIa proteolytic activity.
SUMMARY OF THE INVENTION

A cassette expression vector based on the nuclear inclusion (NIa) protease from tobacco
etch virus (TEV) allows the transcription and translation of a nucleotide sequence
comprising the TEV NIa coding region flanked on each side by its heptapeptide cleavage
sequences, together with insertion sites for the in frame insertion of two different open
reading frames coding for heterologous proteins. Upon translation of the resulting
polypeptide, the protease releases the two heterologous proteins in equimolar amounts by
an autoproteolytic reaction. Therefore, the invention provides a method for obtaining
equimolar amounts of different proteins expressed under the control of a common promoter.
Alternatively, a plurality of insertion sites can be engineered into a cassette containing a
single TEV NIa protease gene for production of a plurality of peptides. In vitro or in vivo,
the expression cassette functions to express genes encoding two or more different
heterologous peptides from a single polypeptide by post-translational self-cleavage by
the NIa protease.
BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1A is a schematic diagram of a TEV NIa-based expression cassette vector, pPRO1.
The open box represents the NIa open reading frame. The shaded areas enlarged above
show (as both nucleotide and amino acid sequence) the heptapeptide recognition sequence
for the NIa proteolytic activity at both the N- and C-termini of NIa; the engineered Sma I
and Stu I cloning sites (underlined) for the in frame introduction of different genes; and the
start ATG and stop TGA codons. The NIa processing site between Gln and Gly is indicated
by an open arrowhead. The TEV 5' non-translated region is indicated by a black arrow
upstream of the NIa coding sequence. Relevant unique restriction enzyme sites are
indicated: Ba (BamH I), Bg (Bgl II), Ec (EcoR I), Sa (Sal I), Sc (Sac I), Sm (Sma I), and
St (Stu I).

Figure 1B is a detailed restriction map of pPRO1 displaying the nucleotide sequence and
the amino acid sequence of the NIa protease (SEQUENCE I.D. NO. 6).

Figure 1C is a schematic diagram showing the amino acid additions that result at the N- and
C-termini of proteins cloned at the Sma I or Stu I insertion sites of expression vector
pPRO1 upon translation and subsequent proteolytic processing. The amino acid
represented by X depends upon the particular restriction site used for cloning and can
coincide with amino acids in the cloned proteins in some cases.
Figure 2 shows an autoradiograph of an SDS-PAGE gel indicating the results of in vitro
translation of RNA transcribed from the pPRO1 expression cassette. Translation
reactions were programmed with 1 µg of brome mosaic virus (BMV) RNAs (lane B),
with no RNA added (lane 0), or with RNA transcribed in vitro from pPRO1 (lane 1).
The molecular masses (in kDa), the positions of the major proteins translated from BMV
RNAs, and the position of the 49 kDa TEV NIa protein are indicated.

Figure 3A shows a schematic representation of six different polypeptides translated in
vitro from RNAs transcribed from different pPRO1-derived constructs containing the TMV
CP sequence. Open boxes represent the TEV NIa sequence. Striped boxes represent the
TMV CP sequence contained in the insertion site. The names of the constructs and the
expected molecular masses of the translated and processed products are indicated. Q/G
indicates the amino acid residues at the cleavage sequence in constructs cloned in pPRO1,
whereas H/G indicates the Gln to His mutation at the -1 position that inhibits processing by
NIa in constructs cloned in pPRO4.

Figure 3B shows an autoradiograph of an SDS-PAGE gel containing in vitro translation
products obtained from the constructs shown in Figure 3A. The vertical axis and lane
assignments are the same as described for Figure 3C below.
Figure 3C shows fluorographs of immunoprecipitation analyses using anti-TMV CP
antibody with aliquots from the translation samples shown in Figure 3B. In Figures 3B
and 3C, translation reactions were programmed with no RNA added (lane 0), or with RNA
transcribed in vitro from pPRO1 (lane 1), pPRO1.NT (lane 2), pPRO1.TN (lane 3),
pPRO1.TaN (lane 4), pPRO4.NT (lane 5), or pPRO4.TN (lane 6). The molecular masses
(in kDa) and positions of 14C-labeled protein markers are indicated. T = TMV coat protein;
N = NIa protease.

Figure 4 shows the results of in vitro translation of RNAs transcribed from pPRO1
constructs containing TMV CP and SMV CP coding sequences inserted at the two sites of
the cassette.

Figure 4A is a schematic diagram representing the vectors pPRO1.SNT and pPRO1.TNS.
The open box represents the TEV NIa sequence. Striped and dotted boxes represent the
TMV CP and SMV CP sequences, respectively, that have been inserted into the cassette
insertion sites. S = SMV coat protein.

Figure 4B shows an autoradiograph of an SDS-PAGE gel with in vitro translation
products obtained from the pPRO1.SNT and pPRO1.TNS vectors. Translation reactions
were programmed with no RNA added (lane 0), or with RNA transcribed in vitro from
pPRO1 (lane 1), pPRO1.SNT (lane 2), or pPRO1.TNS (lane 3). The molecular masses
(in kDa), the positions of the major proteins translated from BMV RNAs, and the positions
of TEV NIa, SMV CP, and TMV CP are indicated.

Figure 5 shows the results of in vitro translation of RNAs transcribed from a pPRO1
vector containing SMV CP and uidA (β-glucuronidase, GUS) coding sequences in the
two insertion sites.




Figure 5B shows an autoradiograph of an SDS-PAGE gel with in vitro translation
products obtained from cassette vector pPRO1.SNG. Translation reactions were
programmed with no RNA added (lane 0), or with RNA transcribed in vitro from either
pPRO1 (lane 1) or pPRO1.SNG (lane 2). The molecular masses (in kDa) and positions of
proteins translated from BMV RNAs are indicated, as are the positions of the TEV NIa,
GUS, and SMV CP proteins. A black arrowhead indicates the position of a 110 kDa
polypeptide present in small amounts.

Figure 5C shows a photograph of an SDS-PAGE gel from a time-course in vitro
translation reaction with vector pPRO1.SNG. Samples were withdrawn at the times (in
minutes) indicated at the top of each lane. At an incubation time of 15 minutes, no 149 kDa
precursor polypeptide could be detected by SDS-PAGE.
DETAILED DESCRIPTION OF THE INVENTION

In TEV, the NIa protease is synthesized as part of the polyprotein that results from the
translation of the TEV genome. The genomic sequence of TEV, first disclosed by R. Allison,
et al. (Virology, 154:9-20, 1986), is publicly available from the EMBL and GenBank
databases under accession number M15239. NIa recognizes and cleaves specific sequences
of seven amino acids (heptapeptides) contained in the polyprotein and is responsible for
partial processing of the viral polyprotein. The heptapeptide cleavage sequences recognized
by TEV NIa, including those immediately flanking NIa itself, have been shown to be
Glu-X-X-Tyr-X-Gln-Gly (SEQUENCE I.D. NO. 1) or Glu-X-X-Tyr-X-Gln-Ser (SEQUENCE
I.D. NO. 2), wherein X can be any amino acid (J. Carrington, et al., 1988, supra, and
W. Dougherty, et al., supra). TEV NIa cleaves between the Gln and the Gly (or Ser)
residues. In one embodiment of the present invention, the self-recognized cleavage
sequence at the N-terminus of the NIa protease is Glu-Pro-Val-Tyr-Phe-Gln-Gly
(SEQUENCE I.D. NO. 3) and the self-recognized cleavage sequence at the C-terminus is
Glu-Leu-Val-Tyr-Ser-Gln-Gly (SEQUENCE I.D. NO. 4). These two heptapeptides are the
ones that bracket the NIa protein in the TEV polyprotein.
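The consensus above can be written as a pattern over one-letter amino acid codes. The sketch below is a hypothetical helper (`find_cleavage_sites` is not part of this disclosure) that reports where NIa would cut, that is, between the Gln at position -1 and the Gly or Ser at position +1. It encodes only the loose Glu-X-X-Tyr-X-Gln-(Gly/Ser) rule stated here; actual NIa specificity may depend on residues beyond these four fixed positions.

```python
import re

# Loose TEV NIa consensus stated in the text, in one-letter codes:
# Glu-X-X-Tyr-X-Gln-(Gly|Ser), where X is any amino acid.
TEV_NIA_SITE = re.compile(r"E[A-Z]{2}Y[A-Z]Q[GS]")


def find_cleavage_sites(protein: str) -> list[int]:
    """Return 0-based indices i at which NIa would cut, i.e. between
    protein[i - 1] (the Gln at -1) and protein[i] (the Gly/Ser at +1)."""
    protein = protein.upper()
    sites = []
    # Test every 7-residue window so that overlapping sites are also found.
    for start in range(len(protein) - 6):
        if TEV_NIA_SITE.fullmatch(protein, start, start + 7):
            sites.append(start + 6)
    return sites


if __name__ == "__main__":
    print(find_cleavage_sites("EPVYFQG"))  # [6]  N-terminal heptapeptide (SEQ ID NO. 3)
    print(find_cleavage_sites("ELVYSQG"))  # [6]  C-terminal heptapeptide (SEQ ID NO. 4)
    print(find_cleavage_sites("EPVYFHG"))  # []   Gln -> His at -1, as in pPRO4
```

Scanning a longer sequence such as `"MKEPVYFQGSW"` returns `[8]`, the index of the Gly that begins the downstream product.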

NIa releases itself from the polyprotein in an autoproteolytic reaction at the cleavage
sequences (J. Carrington, et al., Virology, 160:355-362, 1987), and is active both in cis,
processing polypeptides in which it is included, and in trans, cleaving separate
polypeptides. The cis protease activity of NIa has been assayed with different TEV
polyproteins produced in vitro that contained NIa and either naturally occurring or
mutated versions of the cleavage sequence (J. Carrington, et al., J. Virol., 1988 and 1987,
supra). Protease activity in trans has been observed in many studies using, as substrates,
TEV polyproteins that were labeled in vitro and incubated with NIa extracted from
infected plants.



more different proteins in equimolar amounts. For instance, the cassette vector named
pPRO1, shown in Figure 1, was obtained by PCR amplification using a full-length cloned
TEV cDNA as template. It comprises PRO1 (SEQUENCE ID NO. 5), which includes an open
reading frame encompassing the NIa sequence (TEV nucleotides 5673 to 6983 as numbered
in R. Allison, et al., Virology, 154:9-20, 1986) as well as the target heptapeptides located
at its N-terminus (SEQUENCE ID NO. 3) and C-terminus (SEQUENCE ID NO. 4). The TEV
NIa-based cassette described herein also provides at least two blunt-end restriction sites,
preferably unique, that allow the in frame insertion of heterologous protein coding
sequences into the vector for expression as part of a self-processing polypeptide. As used
herein, the term "heterologous" means that the gene inserted into the cassette insertion site
is not native to TEV.

For instance, in pPRO1 one insertion site is provided by a Sma I restriction enzyme site
at the N-terminus of the TEV NIa sequence, and the other insertion site is provided by a
Stu I restriction enzyme site at the C-terminus. In addition, the cassette optionally
provides a start codon, preferably ATG, and a stop codon, preferably TGA, engineered
upstream of the 5-prime site and downstream of the 3-prime site, respectively. For
instance, in vector pPRO1, which provides two insertion sites, an ATG start codon is
upstream of the Sma I site, and a TGA stop codon is downstream of the Stu I site. In
addition, the TEV NIa-based vectors described herein can include, upstream of the open
reading frame, the 144 nucleotide 5' non-translated region from TEV RNA, which has
been shown to enhance translation in vitro and in vivo (J. Carrington and D. Freed,
J. Virol., 64:1590-1597, 1990).
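Because both insertion sites are blunt and each inserted open reading frame must stay in register with the NIa coding sequence, a simple pre-cloning check is whether the insert length is a multiple of three and free of internal in-frame stop codons. The helper below (`check_in_frame_insert`, a hypothetical name) is a minimal sketch that assumes the blunt cut falls exactly between codons and that the insert is read from its first nucleotide; it does not model the codons contributed by the Sma I or Stu I site itself.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_in_frame_insert(insert: str) -> list[str]:
    """List the problems that would disrupt the cassette reading frame.

    An empty list means the insert keeps the downstream sequence in frame
    and contains no premature stop codon in that frame.
    """
    seq = insert.upper().replace(" ", "")
    problems = []
    if len(seq) % 3 != 0:
        problems.append(f"length {len(seq)} is not a multiple of 3 (frameshift)")
    # Only complete codons are inspected; a trailing partial codon is ignored.
    for index in range(0, len(seq) - 2, 3):
        codon = seq[index:index + 3]
        if codon in STOP_CODONS:
            problems.append(f"in-frame stop codon {codon} at codon {index // 3 + 1}")
    return problems


if __name__ == "__main__":
    print(check_in_frame_insert("GCTGCAGCT"))  # []
    print(check_in_frame_insert("GCTTAAGCT"))  # stop codon at codon 2
    print(check_in_frame_insert("GCTGC"))      # frameshift
```

The check is deliberately conservative: an insert at the Stu I site that ends in its own stop codon would be flagged even though, upstream of the vector TGA, that may be harmless.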

One skilled in the art will appreciate that the techniques described herein could be used
to insert more than two unique restriction endonuclease sites and heptapeptide recognition
sequences into the expression cassette, so as to express more than two heterologous
proteins. Thus, the number of foreign proteins translated as part of an NIa-containing
polyprotein is not, theoretically, limited to two, and embodiments of the cassette vector
in which more than two insertion sites are used for simultaneous expression of more than
two proteins in equimolar amounts are contemplated within the scope of this invention.
In embodiments utilizing more than one insertion site on one or both sides of the gene
encoding the NIa protease and its flanking self-recognition sequences, additional NIa
self-recognition sequences must be provided between adjacent insertion sites to allow
post-translational self-cleavage by the NIa protease. A single protease is sufficient to
cleave multiple sites within the single polypeptide produced from expression of the
cassette.
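The arrangement just described, with a heptapeptide between every pair of neighbouring segments, can be modeled as simple string processing. The sketch below is illustrative only: the segment strings are hypothetical placeholders rather than real coat protein or NIa sequences, the two heptapeptides are those given above (SEQUENCE I.D. NOS. 3 and 4), and every consensus match is assumed to be cleaved completely. Under those assumptions each precursor yields exactly one copy of every product, which is why the products accumulate in equimolar amounts.

```python
import re
from collections import Counter

# Heptapeptides that bracket NIa in the TEV polyprotein (SEQ ID NOS. 3 and 4).
N_SITE = "EPVYFQG"
C_SITE = "ELVYSQG"
# Glu-X-X-Tyr-X-Gln-(Gly|Ser); a zero-width lookahead lets matches overlap.
CONSENSUS = re.compile(r"(?=E..Y.Q[GS])")


def assemble(n_side: list[str], nia: str, c_side: list[str]) -> str:
    """Place a heptapeptide between every pair of neighbouring segments."""
    left = N_SITE.join(n_side) + N_SITE if n_side else ""
    right = C_SITE + C_SITE.join(c_side) if c_side else ""
    return left + nia + right


def process(polyprotein: str) -> list[str]:
    """Cut between Gln (-1) and Gly/Ser (+1) at every consensus site,
    assuming complete processing."""
    cuts = [m.start() + 6 for m in CONSENSUS.finditer(polyprotein)]
    bounds = [0, *cuts, len(polyprotein)]
    return [polyprotein[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Hypothetical placeholder segments: two inserts upstream of NIa, one downstream.
    precursor = assemble(["AAAA", "KKKK"], "HHHHH", ["WWWW"])
    print(process(precursor))
    # ['AAAAEPVYFQ', 'GKKKKEPVYFQ', 'GHHHHHELVYSQ', 'GWWWW']
    # 1000 precursor molecules give 1000 copies of each mature product.
    tally = Counter(p for _ in range(1000) for p in process(precursor))
    print(set(tally.values()))  # {1000}
```

Note that upstream products keep six residues of each heptapeptide at their C-termini and downstream products begin with Gly, which is the origin of the extraneous terminal residues shown in Figure 1C.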

PRO1 (Figure 1B; SEQUENCE ID NO. 6) was sequenced using techniques known in the art,
and six nucleotide changes from the previously published native TEV sequence were found.
According to the numbering in Allison, supra, these changes were GC to CG at nucleotides
5768-5769, A to G at nucleotide 5773, A to G at nucleotide 6235, T to C at nucleotide
6314, and A to G at nucleotide 6961. The mutations were left unmodified because, as shown
by the results presented herein, they did not affect the protease activity of NIa.

The cassette expression vectors presented herein, which exploit the proteolytic processing
strategy of the TEV NIa protease, share the advantages particular to that protease. First,
NIa is a highly specific proteinase whose cleavage sequence has been well characterized
(Carrington, et al., 1988; Dougherty, et al., 1988, supra; W. Dougherty, et al., Virology,
171:356-364, 1989; Dougherty, et al., Virology, 172:145-155, 1989). Second, NIa retains
activity in vitro when cleavage sequences are inserted into several locations in TEV
polyproteins (Carrington, et al., 1988, supra; Dougherty, et al., 1988, supra) or into
non-viral proteins (Parks, et al., J. Gen. Virol., 73:775-783, 1992). Finally,



in one embodiment of the TEV NIa-based expression cassette vectors provided herein,
the NIa protease functions in vitro to cleave polypeptides containing inserted coding
sequences for many different polypeptides ranging in size from 1 to as many as about 800
amino acids. In most of the constructs tested, cleavage was so effective that non-processed
precursors could not be detected. In only two cases (an illustration is shown with
pPRO1.SNG in Example 4) were minimal amounts of non-cleaved precursors detected,
indicating incomplete processing. These in vitro results suggest that this approach is also
useful for in vivo applications, wherein the vectors are introduced into suitable plants by
electroporation into plant protoplasts using methods well known in the art (see, for
instance, Current Protocols in Molecular Biology, Ed. by F.M. Ausubel, Current Protocols,
Vol. 1, §9.3.2-3, 1993). Transformed protoplasts can be harvested and grown into full
transgenic plants (C. A. Rhodes, et al., Science, 240:204-207, 1988).

In alternative embodiments, NIa-based expression cassette vectors are used in systems
other than those involving plant cells. In general, the expression cassette of this invention
can be used in any system in which the NIa protease has activity, for example, in insect,
mammalian, and other eukaryotic cells, and in bacteria, if operatively linked to suitable
expression control elements, such as a promoter and a polyadenylation sequence, so as to
bring about replication of the attached segment in a vector suitable for the type of cell line
selected. However, for prokaryotic cells it may be necessary to reengineer the vector to
match the codon preferences of the host organism (see C. J. Noren, et al., Science,
244:182, 1989). For example, as is well known, Bacillus spp. generally prefer more
A/T-rich nucleotide sequences.
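One rough way to judge whether an insert may need such re-engineering is to compare its A+T content, overall and at third codon positions, with that typical of the intended host. The function below is a minimal sketch; its name and the choice of metric are assumptions, not part of this disclosure. Third-position composition is only a crude proxy for codon usage, and a proper analysis would compare full codon-usage tables.

```python
def at_content(seq: str) -> tuple[float, float]:
    """Return (A+T fraction over the whole sequence,
    A+T fraction at third codon positions, read in frame from the first base)."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    overall = sum(base in "AT" for base in s) / len(s)
    thirds = s[2::3]  # third base of each complete codon
    at3 = sum(base in "AT" for base in thirds) / len(thirds) if thirds else 0.0
    return overall, at3


if __name__ == "__main__":
    overall, at3 = at_content("ATGAAATTT")
    print(f"overall A+T = {overall:.2f}, third-position A+T = {at3:.2f}")
    # overall A+T = 0.89, third-position A+T = 0.67
```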
The choice of vector to which a cassette of this invention is operatively linked depends
directly, as is well known in the art, on the host cell to be transformed and the functional
properties desired, e.g., vector replication and protein expression, these being limitations
inherent in the art of constructing recombinant molecules. The vector itself may be of any
suitable type, such as a viral vector (RNA or DNA), naked linear or circular DNA, or a
vesicle or envelope containing the nucleic acid material to be inserted into the cell.
Techniques for construction of lipid vesicles, such as liposomes, are well known. Such
liposomes may be targeted to particular cells using other conventional techniques, such
as providing an antibody or other specific binding molecule on the exterior of the
liposome (see, e.g., A. Huang, et al., J. Biol. Chem., 255:8015-8018, 1980). In one
embodiment of the invention, transient expression is contemplated, wherein expression
of the polypeptide is driven either by conventional transcriptional promoters or by plant
viral vectors. In another embodiment, the TEV NIa-based cassette vector is used in
prokaryotic systems, since NIa proteases from different potyviruses have been shown to
be active when expressed in bacterial cells (Garcia, et al., Virology, 170:362-369, 1989;
Vance, et al., Virology, 191:19-30, 1992). The TEV NIa-based expression vector can
therefore be used advantageously whenever it is desirable to achieve equimolar
production of two peptides in bacterial expression systems, by inserting the NIa cassette
into a bacterial expression vector, such as a member of the pUC vector family. Other
insect and animal cells known in the art to be useful in expression of recombinant proteins
can also be used. For instance, the cassette vectors can be used in production of
recombinant antibodies wherein it is desirable to achieve equimolar amounts of the heavy
and light chains. In another embodiment, the cassette vectors provided herein are used to
produce molecules that spontaneously assemble into a two-subunit complex, such as an
enzyme. In yet another embodiment, a vector having more than two insertion sites is used
to express multimers of any type.



target heptapeptide and the cloning strategy used. The schematic diagram of Figure 1C
illustrates the amino acid additions at the N- and C-termini that result when the proteins
(open boxes) are cloned at either the Sma I (Sm) or Stu I (St) insertion site of pPRO1. The
amino acid represented by 'X' depends on the restriction site used for cloning. In some
cases one or more of the extraneous amino acids can be incorporated into the protein
because the residue is already native to its sequence and does not have to be engineered in.

Because additional amino acids are included at both termini of the cloned peptides, the
biological activity of some proteins expressed in this system may be affected. However,
one skilled in the art will know how to purify the produced proteins and treat them to clip
off the extraneous residues. For instance, as shown in Figure 1C, after cleavage by the
protease the heterologous proteins can carry, among the extraneous terminal amino acids,
an undefined amino acid (represented by "X") immediately adjacent to the protein at
either end. If 'X' is selected to be a methionine and the produced peptide contains no other
methionines, the peptide can readily be treated with cyanogen bromide to remove the
extraneous residues. For example, the coat protein of TMV, which contains no
methionines, can be expressed in one or both of the insertion sites, purified, and then
treated with cyanogen bromide to provide the coat protein sequence free of extraneous
terminal residues. One skilled in the art will be able to similarly use enzymes that cleave
peptides between two particular residues to clip off the extraneous terminal residues from
the heterologous product peptides.
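Cyanogen bromide cleaves peptide bonds on the C-terminal side of methionine residues, converting the cleaved Met to homoserine (or its lactone) at the C-terminus of the upstream fragment. The sketch below models that rule on one-letter sequences; the example sequences are hypothetical and are not the actual pPRO1 junctions or the TMV CP sequence, and the homoserine conversion is only noted in a comment. Under this model an N-terminal 'X' = Met frees the protein's N-terminus completely, whereas a C-terminal 'X' = Met leaves that one residue, as homoserine, on the protein.

```python
def cnbr_cleave(protein: str) -> list[str]:
    """Split a one-letter sequence after every Met (M), as cyanogen bromide does.

    In the real reaction the Met at each cut becomes homoserine (lactone) on the
    upstream fragment; it is kept here as 'M' for simplicity.
    """
    s = protein.upper()
    fragments, start = [], 0
    for i, residue in enumerate(s):
        if residue == "M":
            fragments.append(s[start:i + 1])
            start = i + 1
    if start < len(s):
        fragments.append(s[start:])
    return fragments


if __name__ == "__main__":
    core = "AKLSTRG"  # hypothetical Met-free protein segment
    print(cnbr_cleave("GPM" + core))      # ['GPM', 'AKLSTRG']
    print(cnbr_cleave(core + "MEPVYFQ"))  # ['AKLSTRGM', 'EPVYFQ']
```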

Several practical applications of the NIa cassette expression vectors, utilizing their
expression in plants as a transgene, are also contemplated herein. For instance, coat
protein mediated resistance (CPMR) to viral infections can generally be obtained only
against viruses of the same taxonomic group as the one whose coat protein was used as
the vaccine (Fitchen & Beachy, Annu. Rev. Microbiol., 47:739-763, 1993). To engineer
CPMR against viruses that belong to different taxonomic groups, sequences encoding two
or more viral coat proteins from different taxonomic groups can be inserted into the
insertion sites of an NIa-based vector having two or more insertion sites. Alternatively, an
insect resistance gene can be combined with a virus resistance gene. In an alternative
embodiment, the vector of this invention can be used to express a selectable marker plus
any other gene encoding a protein of the size contemplated herein.

In yet another embodiment of this invention, described in full detail in U.S. Patent
Application Serial No. 08/192,477, cofiled herewith and incorporated herein by reference,
the vector into which the cassette is ligated is a modification of the "infectious cDNA
clone" of tobacco mosaic virus, to which is operably linked a promoter for T7 RNA
polymerase. Highly infectious RNA transcripts of a full-length cDNA of the U1 (common)
strain of TMV have been produced in vitro using bacteriophage T7 RNA polymerase
(Dawson, et al., Proc. Natl. Acad. Sci. USA, 83:1832-1836, 1986; Meshi, et al., Proc.
Natl. Acad. Sci. USA, 83:5043-5047, 1986). When inoculated into tobacco plants and
other suitable host plants, this transcript causes systemic viral infection. Therefore, when
inserted into the infectious cDNA clone of TMV, the vector of this invention can also be
used to simultaneously provide systemic resistance to insects and viruses in plants.

In this embodiment of the invention, to accommodate the cassette to be inserted therein,
the cDNA encoding the TMV movement protein is deleted from the TMV infectious
clone, and the NIa-based cassette is ligated in its place, thereby creating a modified viral
vector. The modified infectious clone, carrying nucleotide sequences encoding
heterologous peptides ligated into the insertion sites of its NIa-based cassette, can be
inoculated into host plants for expression therein. Therefore, in this embodiment of the



into the insertion sites of the NIa-based cassette in the infectious clone vector for
production in the host plant. Since the modified infectious clone vector retains the native
gene encoding the coat protein of TMV, a cassette with two insertion sites can be used to
express two additional CP sequences, which, together with the native TMV CP, can confer
CPMR against viruses from three different taxonomic groups. If recombinant plants
transformed with a gene encoding the wild-type movement protein of TMV, such as plant
line 277 (Deom, et al., Cell, 69:221-224, 1992), are inoculated with the modified
infectious clone vector, the viral infection will spread systemically. This modified
infectious clone vector takes advantage of the extremely high level of expression
characteristic of the viral system, and can be used to economically produce large amounts
of polypeptides, virions suitable for use as vaccines, etc. One skilled in the art will
appreciate that such product polypeptides and/or virions can be purified from plant leaves
using standard methods (Bruening, et al., Virology, 71:498-517, 1976).

In initial experiments, constructs containing NIa and the CP of TMV (Figure 3A) were
introduced into Nicotiana tabacum via Agrobacterium tumefaciens transformation.
Preliminary data indicate that TMV CP expressed in vivo as part of pPRO1 confers
CPMR (data not shown). Additional constructs with an insert that encodes a viral coat
protein and a gene encoding β-glucuronidase will enable use of GUS activity as a probe
for the expression of the CP. Since the activity of the CP is destroyed if the protease does
not cleave in the exact place anticipated, this experiment showed the specificity of the
NIa protease for cleaving multiple exogenous peptides. This approach will be useful for
studying those examples in which there is poor correlation between the levels of CP
accumulation and the degree of plant viral resistance, providing additional important
data on the molecular mechanism(s) of CPMR in these cases.



WO 95/21249                                                    PCT/US95/01495

The following examples illustrate the manner in which the invention can be practiced. 
It is understood, however, that the examples are for the purpose of illustration and the 
invention is not to be regarded as limited to any of the specific materials or conditions 
therein. 

EXAMPLE 1 
CONSTRUCTION OF pPRO1 VECTORS

Recombinant DNA manipulation and E. coli transformation were carried out according
to existing protocols (Sambrook, et al., Molecular Cloning: A Laboratory Manual, Cold
Spring Harbor Laboratory, Cold Spring Harbor, New York, 1989). The DNA inserts used
for the assembly of the different constructs were obtained by the polymerase chain
reaction (PCR) using equipment and techniques provided by Perkin Elmer Cetus
(Emeryville, CA). The sequences of the primers used for amplification are detailed in
Table 1, the prefix of each primer name indicating the gene to which it is targeted.

The expression cassette vector pPRO1 (Figures 1A and 1B) was assembled in pBluescript
II KS (+) (Stratagene, San Diego, CA) under the transcriptional control of a T7 promoter
by directional insertion of PRO1 (SEQUENCE ID NO. 5) at the Sac I - EcoR I sites of
the multiple cloning site, yielding pPRO1. NIa and 5'-non-translated (5'-NTR) sequences
from TEV were obtained by PCR using a full-length TEV cDNA clone as DNA template
(kindly provided by Dr. J. Carrington, Texas A&M University). Oligonucleotide primers
for amplification of NIa were TEVNIA.N and TEVNIA.C (SEQUENCE ID NOS. 7 and
8, respectively). These two primers amplified the NIa open reading frame (Figure 1B)
plus the sequences encoding the two specific heptapeptide cleavage sequences located at
each end of NIa in the TEV genome, and contained, in addition, either Xba I and Sma I
(TEVNIA.N) or Stu I and EcoR I (TEVNIA.C) restriction enzyme sites. The PCR
product was directionally inserted into pBluescript using Xba I and EcoR I to yield vector



primers contained either Sac I and Bgl II (TEVNTR.5) or Sma I (TEVNTR.3) restriction
enzyme cleavage sites. The final step in the assembly of pPRO1 was a Sac I-Sma I
directed insertion of the TEV 5'-NTR resulting from the PCR reaction into vector
pBCNIa. Mutagenesis at the heptapeptides in the TEV sequence encoding the protease
cleavage recognition sites was accomplished with primers TEVNIA.N2 and TEVNIA.C3
(SEQUENCE ID NOS. 11 and 12, respectively), which contained either one or two
nucleotide changes (when compared to TEVNIA.N and TEVNIA.C, respectively) that
mutated the glutamine located at position -1 (relative to the cleavage site) to histidine and
introduced an Nco I site useful for recovering the recombinant clones from the
cloning vector pBCNIa.
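The effect of the mutagenic primers can be checked by translating their heptapeptide-encoding regions. The sketch below uses the primer sequences listed in Table 1 (spaces removed), the standard genetic code, and hypothetical helper names (`translate`, `reverse_complement`). TEVNIA.C3 is a reverse primer, so its reverse complement is translated, starting one base in so that the codons line up with the heptapeptide. The comparison shows the Gln (CAA) to His (CAT) change at -1 and the Nco I site (CCATGG) it creates.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
NCO_I = "CCATGG"


def translate(dna: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Heptapeptide-encoding 3' portions of primers from Table 1.
TEVNIA_N = "GAACCAGTCTATTTCCAAGGG"
TEVNIA_N2 = "GAACCAGTCTATTTCCATGGG"
TEVNIA_C3 = "CCCATGGGAGTACACCAATTCA"  # reverse primer

if __name__ == "__main__":
    print(translate(TEVNIA_N), NCO_I in TEVNIA_N)    # EPVYFQG False
    print(translate(TEVNIA_N2), NCO_I in TEVNIA_N2)  # EPVYFHG True
    c3_sense = reverse_complement(TEVNIA_C3)[1:]     # drop the base outside the frame
    print(translate(c3_sense), NCO_I in c3_sense)    # ELVYSHG True
```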

The cDNAs for different open reading frames (ORFs) encoding heterologous peptides
inserted into pPRO1 included those encoding the tobacco mosaic virus (TMV) and soybean
mosaic virus (SMV) coat proteins (CP), as well as the uidA gene encoding
β-glucuronidase (GUS) from E. coli. These ORFs were obtained by PCR using as
template publicly available nucleotide sequences. The nucleotide sequence of tobacco
mosaic virus RNA, first published by P. Goelet, et al. (Proc. Natl. Acad. Sci. U.S.A.,
79:5818-5822, 1982), is publicly available from the EMBL and GenBank databases under
Accession Numbers V01408 and J02415. The nucleotide sequence of the CP gene of
soybean mosaic virus, first

1860, 1989, is available from the EMBL and GenBank databases under Accession Number
D00507. The gene encoding GUS, first disclosed by R. A. Jefferson, et al. (Proc. Natl.
Acad. Sci. U.S.A., 83:8447-8451, 1986) and available from the EMBL and GenBank
databases under Accession Number M14641, was obtained from Clontech. For PCR to
obtain the ORF of TMV CP, primer TMVCP.5' (SEQUENCE ID NO. 13) was used at
the 5' end and TMVCP.3' (SEQUENCE ID NO. 14) was used at the 3' end. For PCR
to obtain the ORF of SMV CP, primer SMVCP.N1 (SEQUENCE ID NO. 15) was used
at the 5' end and primer SMVCP.C2 (SEQUENCE ID NO. 16) was used at the 3' end.
For PCR to obtain the ORF of GUS, primer GUS.N2 (SEQUENCE ID NO. 17) was used
at the 5' end and primer GUS.C1 (SEQUENCE ID NO. 18) was used at the 3' end.



10 
SEQUENCES OF THE OLIGONUCLEOTIDE PRIMERS USED(a)

TEVNIA.N      5'-GC TCTAGA CCCGGG GAACCAGTCTATTTCCAAGGG-3'        (SEQ. ID NO. 7)
TEVNIA.C      5'-G CGAATTC A AGGCCT CCC TTGCG AGTACACC A ATTCA-3'  (SEQ. ID NO. 8)
TEVNTR.5      5'-GC CGAGCTC AGATCT AA ATAACAAATCTCAACACAACA-3'    (SEQ. ID NO. 9)
TEVNTR.3      5'-TCCCCCGGG CATGGCTAT CGTTCGTAAATGG-3'             (SEQ. ID NO. 10)

TEVNIA.N2(b)  5'-TGG CCCGGG GAACCAGTCTATTTCCATGGG-3'              (SEQ. ID NO. 11)
TEVNIA.C3(b)  5'-G CGAATTC A AGGCCT CCCATGGGAGTAC ACCAATTCA-3'    (SEQ. ID NO. 12)

TMVCP.51      5'-A AAGGCCT TCTTACAGTATCACTACTCC-3'                (SEQ. ID NO. 13)
TMVCP.31      5'-AGG CCCGGG AGTTGCAGGA CCAGAGGTCC-3'              (SEQ. ID NO. 14)
SMVCP.N1      5'-A AAGGCCT TCAGGCAAGGAGAAGG-3'                    (SEQ. ID NO. 15)
SMVCP.C2      5'-AGG CCCGGG CTGCGGTGGGCCCATGC-3'                  (SEQ. ID NO. 16)
GUS.N2        5'-A AAGGCCT GTAGAAACCCCAACCCG-3'                   (SEQ. ID NO. 17)
GUS.C1        5'-CG GAATTC TCATTGTTTGCCTCCCTGCTG-3'               (SEQ. ID NO. 18)

(a) Nucleotides annealing to the target genes are underlined with a single line, whereas nucleotides corresponding to the restriction enzyme recognition sequences are doubly underlined.

(b) Nucleotides changed in TEVNIA.N2 and TEVNIA.C3, when compared with TEVNIA.N and TEVNIA.C, respectively, are marked by an asterisk underneath.
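The restriction sites carried by each primer can be checked mechanically. The Python sketch below scans the primer sequences of SEQ ID NOs. 7 to 18 for exact matches to a small table of standard recognition sequences (Xba I, Sma I, EcoR I, Stu I, Sac I, Bgl II); the choice of enzymes in the table is an assumption made for illustration, and the scan ignores degenerate sites and cutting efficiency near primer ends.

```python
# Standard recognition sequences (all palindromic); table chosen for illustration.
SITES = {
    "Xba I": "TCTAGA",
    "Sma I": "CCCGGG",
    "EcoR I": "GAATTC",
    "Stu I": "AGGCCT",
    "Sac I": "GAGCTC",
    "Bgl II": "AGATCT",
}

# Primer sequences (5' to 3') as given in SEQ ID NOs. 7 to 18.
PRIMERS = {
    "TEVNIA.N": "GCTCTAGACCCGGGGAACCAGTCTATTTCCAAGGG",
    "TEVNIA.C": "GCGAATTCAAGGCCTCCCTTGCGAGTACACCAATTCA",
    "TEVNTR.5": "GCCGAGCTCAGATCTAAATAACAAATCTCAACACAACA",
    "TEVNTR.3": "TCCCCCGGGCATGGCTATCGTTCGTAAATGG",
    "TEVNIA.N2": "TGGCCCGGGGAACCAGTCTATTTCCATGGG",
    "TEVNIA.C3": "GCGAATTCAAGGCCTCCCATGGGAGTACACCAATTCA",
    "TMVCP.51": "AAAGGCCTTCTTACAGTATCACTACTCC",
    "TMVCP.31": "AGGCCCGGGAGTTGCAGGACCAGAGGTCC",
    "SMVCP.N1": "AAAGGCCTTCAGGCAAGGAGAAGG",
    "SMVCP.C2": "AGGCCCGGGCTGCGGTGGGCCCATGC",
    "GUS.N2": "AAAGGCCTGTAGAAACCCCAACCCG",
    "GUS.C1": "CGGAATTCTCATTGTTTGCCTCCCTGCTG",
}

def find_sites(seq):
    """Return {enzyme: [0-based start positions]} for each site present in seq."""
    hits = {}
    for enzyme, site in SITES.items():
        positions = [i for i in range(len(seq) - len(site) + 1)
                     if seq[i:i + len(site)] == site]
        if positions:
            hits[enzyme] = positions
    return hits

for name, seq in PRIMERS.items():
    print(f"{name:10s} {len(seq):3d} nt  {find_sites(seq)}")
```

Run on these primers, each coat protein forward primer carries a Stu I site and each coat protein reverse primer a Sma I site, which matches the Stu I/Sma I digestion of the PCR products described in the next paragraph.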
PCR products corresponding to the SMV and TMV CP genes were digested with Stu I and Sma I and inserted at either the Sma I or the Stu I site of pPRO1 (Figure 1), depending on the construct. The PCR product corresponding to the uidA ORF was digested with Stu I and EcoR I and inserted at the C terminus of NIa in pPRO1.
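Because Sma I (CCC^GGG) and Stu I (AGG^CCT) both leave blunt ends, a fragment cut with these enzymes can be ligated into either site, and what matters is that the junction keeps the insert in the NIa reading frame. The Python sketch below models only the 5' junction of a TMV CP fragment at the Sma I site, using the cassette sequence shown in Figure 1 and the Stu I-cut end of primer TMVCP.51; the helper names are illustrative, and ligation is treated as plain concatenation of blunt ends.

```python
# Standard genetic code in compact form (bases ordered T, C, A, G).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {a + b + c: AMINO[16 * i + 4 * j + k]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)
         for k, c in enumerate(BASES)}

def translate(dna):
    """Translate the complete codons of dna; '*' marks a stop codon."""
    return "".join(CODON[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def blunt_cut(seq, site, cut_offset):
    """Cut seq at the first occurrence of a blunt-cutting recognition site.

    cut_offset is the cut position inside the site (3 for both Sma I and
    Stu I). Returns the (left, right) pieces."""
    start = seq.find(site)
    if start < 0:
        raise ValueError(f"{site} not found")
    return seq[:start + cut_offset], seq[start + cut_offset:]

# N-terminal region of pPRO1 around the Sma I site (Figure 1).
PPRO1_N = "ATGCCCGGGGAACCAGTCTATTTCCAAGGG"
# 5' end of a TMV CP PCR product primed with TMVCP.51 (SEQ ID NO. 13).
TMV_5PRIME = "AAAGGCCTTCTTACAGTATCACTACTCC"

vector_left, vector_right = blunt_cut(PPRO1_N, "CCCGGG", 3)
_, insert_start = blunt_cut(TMV_5PRIME, "AGGCCT", 3)
junction = vector_left + insert_start

print(translate(vector_left + vector_right))  # uncut cassette start
print(translate(junction))                     # Sma I junction with the insert
```

The junction reads Met-Pro followed directly by the codons encoded downstream of the Stu I site in TMVCP.51, with no stop codon, which is the in-frame fusion the cassette depends on.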

EXAMPLE 2

IN VITRO TRANSCRIPTION AND TRANSLATION 

One ng of plasmid pPRO1 DNA containing the inserted heterologous ORFs, purified from E. coli through QIAprep mini columns (Qiagen, Chatsworth, CA), was first linearized with Sal I (which cleaves downstream of pPRO1) and subsequently transcribed in vitro with T7 RNA polymerase (Epicentre Technologies, Madison, WI). The size and integrity of the transcribed mRNA were confirmed by agarose gel electrophoresis. Approximately one µg of mRNA was used to program in vitro translation in 25 µL reactions using a nuclease-treated rabbit reticulocyte lysate system (Promega, Madison, WI) according to the manufacturer's protocol. Proteins were synthesized in the presence of 35S-Met and analyzed by SDS-PAGE (12.5% polyacrylamide), according to the method of U. Laemmli (Nature [London], 227:680-685, 1970), followed by autoradiography. Because TMV CP contains no methionine residues, 3H-Leu was used instead when the TMV CP ORF was translated in vitro.
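The choice of label follows from the composition of the product that has to be detected. A minimal Python sketch of that rule, assuming single-letter amino acid codes and only the two labels named above; the test peptides are short stretches of SEQ ID NO. 6 chosen for illustration, not complete translation products.

```python
def suggest_label(mature_protein):
    """Suggest a radiolabelled amino acid for detecting a translation product.

    Mirrors the rule in the text: use 35S-Met when the product to be detected
    contains Met, otherwise fall back to 3H-Leu.
    Returns (label, number of labelled residues in the product)."""
    seq = mature_protein.upper()
    met, leu = seq.count("M"), seq.count("L")
    if met:
        return "35S-Met", met
    if leu:
        return "3H-Leu", leu
    raise ValueError("no Met or Leu: another labelled amino acid is needed")

# Short peptides from SEQ ID NO. 6, used only to exercise the rule.
print(suggest_label("MPGEPVYFQG"))  # residues 1-10 of PRO1
print(suggest_label("GKKNQKHKLK"))  # residues 10-19 of PRO1
```

Counting on the mature product, rather than on the whole precursor, matters here: a coat protein released from the Stu I site carries no initiator Met from the cassette.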

As shown in Figure 2, in vitro transcription and subsequent in vitro translation of pPRO1 in the presence of 35S-Met gave the expected translation product of approximately 49 kDa. This protein corresponded to NIa, since it exhibited the proper proteolytic activity when expressed in pPRO1 as part of a polyprotein.
Other minor bands were also detected, some of which could be due to the autoproteolysis that releases the VPg (the protein linked to the 5' end of the viral RNA) from the protease domain of NIa during post-translational processing of TEV, as described in W. Dougherty, et al. (Virology, 183:449-456, 1991).

Construction of Vectors Expressing TMV CP 

To confirm that pPRO1 encodes NIa protease activity, several constructs were engineered in which the CP ORF from tobacco mosaic tobamovirus (TMV) was inserted into the cassette vector provided herein. These constructs are shown schematically in Figure 3A. The first two constructs, pPRO1.NT and pPRO1.TN, contained the TMV CP sequence in the C-terminal or N-terminal cloning site, respectively. To demonstrate that processing of the resultant polyprotein was due to recognition and cleavage of the specific heptapeptides by the NIa protease and not to non-specific degradation, two additional controls were designed. First, the C-terminal NIa protease domain was removed with a frameshift mutation at the unique BamHI site, resulting in pPRO1.TaN (Figure 3A). In this construct, processing is not expected despite the presence of the naturally occurring cleavage sequence. Second, using the methods described in Example 1, the two target heptapeptides were mutated to include a Gln to His change at the -1 position. This mutation at the cleavage site has previously been shown to inhibit specific processing by NIa in TEV (Dougherty, et al., 1988, supra; Dougherty, et al., 1989, supra). The resulting mutant cassette vector was named pPRO4, and the corresponding constructs pPRO4.NT and pPRO4.TN were also made, as shown in Figure 3A.
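The consensus cleavage motif of SEQ ID NOs. 1 and 2 (Glu-X-X-Tyr-X-Gln followed by Gly or Ser, with the cut after Gln) can be written as a regular expression, which makes the effect of the pPRO4 mutation easy to see: replacing the Gln removes the match. This Python sketch treats the consensus as a strict pattern, which is a simplification; it does not model weaker or context-dependent sites that NIa may also process.

```python
import re

# Consensus of SEQ ID NOs. 1 and 2: Glu-X-X-Tyr-X-Gln-(Gly|Ser), cut between
# Gln and Gly/Ser. The zero-width lookahead lets overlapping motifs match.
CONSENSUS = re.compile(r"(?=E..Y.Q[GS])")

def cleavage_sites(protein):
    """0-based index of the first residue after each predicted cut."""
    return [m.start() + 6 for m in CONSENSUS.finditer(protein)]

def mutate(protein, position, residue):
    """Return protein with the residue at 0-based `position` replaced."""
    return protein[:position] + residue + protein[position + 1:]

n_site = "EPVYFQG"  # SEQ ID NO. 3, N-terminal self-cleavage sequence of NIa
c_site = "ELVYSQG"  # SEQ ID NO. 4, C-terminal self-cleavage sequence of NIa
print(cleavage_sites(n_site), cleavage_sites(c_site))

# pPRO4-style mutation: Gln, the residue just before the cut, becomes His.
mutant = mutate(n_site, n_site.index("Q"), "H")
print(mutant, cleavage_sites(mutant))
```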

In vitro transcription and translation of the TMV CP-containing constructs in the rabbit reticulocyte lysate described above, in the presence of 3H-Leu, followed by SDS-PAGE (15% polyacrylamide) and fluorography, revealed the expected patterns and sizes of labeled proteins, as shown in Figure 3B. In addition to the 49 kDa protein, a band corresponding to a protein of approximately 18 kDa was detected with pPRO1.NT and pPRO1.TN; 18 kDa is the expected size of TMV CP when expressed in pPRO1 constructs. The CP produced from pPRO1.TN was slightly larger than that produced from pPRO1.NT, in accordance with the numbers of amino acid residues added when the cDNA was cloned at the Sma I site versus the Stu I site (see Figure 1C). On the other hand, the major proteins resulting from constructs pPRO4.NT and pPRO4.TN migrated at positions corresponding to the size of the precursor polypeptide containing NIa plus TMV CP (68 kDa). Finally, when the protease domain of NIa was absent (pPRO1.TaN), a single protein of about 28 kDa, corresponding to the truncated protein, was detected.
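The band pattern follows simple additive bookkeeping: when the heptapeptides are cleaved, NIa and TMV CP run separately; when cleavage is blocked (pPRO4), one precursor of roughly their combined size is expected, about 67 kDa against the 68 kDa observed. The Python sketch below encodes that reasoning using only the sizes given in the text; ignoring the few linker and junction residues is a simplifying assumption.

```python
# Apparent sizes (kDa) of the mature products, as reported in the text.
SIZES_KDA = {"NIa": 49, "TMV CP": 18}

def expected_bands(parts, processed):
    """Approximate band sizes for a polyprotein made of `parts` (N to C).

    processed=True: each part is released and runs at its own size.
    processed=False: the polyprotein runs as one precursor whose size is the
    sum of its parts. Linker and junction residues are ignored."""
    if processed:
        return sorted((SIZES_KDA[p] for p in parts), reverse=True)
    return [sum(SIZES_KDA[p] for p in parts)]

print("pPRO1.NT:", expected_bands(["NIa", "TMV CP"], processed=True))
print("pPRO4.NT:", expected_bands(["NIa", "TMV CP"], processed=False))
```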

Results of in vitro translation followed by immunoprecipitation analyses of these vectors are shown in Figure 3C. Immunoprecipitation assays were based upon previously described protocols with minor modifications. Briefly, 20 µL aliquots of in vitro translation reactions were diluted to 100 µL with TBSN (25 mM Tris-HCl pH 7.5, 150 mM NaCl, 1% Nonidet P-40) and pre-incubated with protein A Sepharose beads (Sigma, St. Louis, MO) for 15 minutes on ice. After the beads were removed, one µL of an appropriate dilution of a polyclonal antibody raised against TMV CP (ATCC# PVAS-135) by standard techniques well known in the art was added. The mixture was incubated for 2-4 hours at 4°C with slow shaking. Subsequently, protein A Sepharose beads previously blocked with rabbit reticulocyte lysate were added, and the mixture was kept on ice for 15 minutes with occasional shaking. The Sepharose beads were recovered and washed twice with 0.5 M LiCl, 20 mM Tris-HCl pH 8, once with TBSN, and once with H2O. Finally, beads containing immunoprecipitated labeled proteins were resuspended in SDS-PAGE loading buffer, and the proteins were analyzed as described above.
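Preparing TBSN and the diluted aliquots is a matter of C1V1 = C2V2. The Python sketch below computes volumes for 100 mL of TBSN; the stock strengths (1 M Tris-HCl, 5 M NaCl, 10% Nonidet P-40) are assumptions chosen for illustration and are not part of the protocol above.

```python
def stock_volume(final_conc, final_volume, stock_conc):
    """Solve C1 * V1 = C2 * V2 for V1, the volume of stock to add.

    final_conc and stock_conc must share units (mM and mM, or % and %);
    the result has the units of final_volume."""
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the final solution")
    return final_conc * final_volume / stock_conc

# Target TBSN: 25 mM Tris-HCl pH 7.5, 150 mM NaCl, 1% Nonidet P-40.
# Stock strengths below are illustrative assumptions.
final_ml = 100.0
tbsn = {
    "1 M Tris-HCl pH 7.5": stock_volume(25, final_ml, 1000),  # mM over mM
    "5 M NaCl": stock_volume(150, final_ml, 5000),            # mM over mM
    "10% Nonidet P-40": stock_volume(1, final_ml, 10),        # % over %
}
tbsn["water to volume"] = final_ml - sum(tbsn.values())

# Bringing a 20 uL translation aliquot to 100 uL needs 80 uL of TBSN.
aliquot_ul, diluted_ul = 20, 100
tbsn_to_add_ul = diluted_ul - aliquot_ul

for component, ml in tbsn.items():
    print(f"{component:22s} {ml:5.1f} mL")
print(f"TBSN added to each aliquot: {tbsn_to_add_ul} uL")
```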

Immunoprecipitation of the proteins produced in vitro with an anti-TMV CP antibody precipitated the expected proteins (Figure 3C). Only those peptides that included TMV CP sequences were selectively immunoprecipitated, whereas the 49 kDa NIa protein was not. These data demonstrate that pPRO1 functions as predicted.

Several experiments were carried out to determine whether proteolytic processing could occur in trans. The labeled peptide translated from pPRO1.TaN was not processed when non-labeled 49 kDa protein translated from pPRO1 was used as the source of NIa proteinase (data not shown). This result is in agreement with previously reported data (J. Carrington and W. Dougherty, 1987, supra).



EXAMPLE 3

PROTEOLYTIC PROCESSING OF TWO
DIFFERENT PROTEINS INTRODUCED IN pPRO1

pPRO1 was further tested by introducing coding sequences for two different heterologous proteins into the two insertion sites. ORFs encoding coat proteins from viruses belonging to different groups, SMV (S; a potyvirus) and TMV (T; a tobamovirus), were inserted to create constructs having the heterologous ORFs in the two possible positions. Figure 4A shows the resulting constructs pPRO1.SNT and pPRO1.TNS. As shown in Figure 4B, in vitro transcription and translation of these two constructs gave the predicted patterns of labeled proteins, resulting in the accumulation of proteins with the expected sizes of NIa (49 kDa), SMV CP (…) and TMV CP (18 kDa). As expected, the coat proteins inserted at the Sma I site of pPRO1 gave slightly larger mature proteins than those inserted at the Stu I site, owing to the incorporation of extra amino acid residues as described in Figure 1C. Moreover, the more rapidly migrating proteins (predicted to be TMV CP) co-migrated with the proteins recovered following immunoprecipitation with anti-TMV CP antibody as in Example 2 above.
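Figure 1 shows why the same coat protein runs slightly larger when cloned at the Sma I site: the cassette contributes more residues there (Met-Pro from the start codon and Gly-Glu-Pro-Val-Tyr-Phe-Gln before the cut, versus Gly-Arg and Pro at the Stu I site). The Python sketch below counts those residues; the junction residues marked X in the figure are ignored, the average residue mass of about 110 Da is a common approximation rather than a value from the text, and the insert "AAAA" is a hypothetical placeholder.

```python
# Cassette residues flanking an insert after processing, read from Figure 1.
# Junction residues (X in the figure) depend on the insert and are not counted.
EXTENSIONS = {
    "Sma I": ("MP", "GEPVYFQ"),  # Met-Pro-[protein]-Gly-Glu-Pro-Val-Tyr-Phe-Gln
    "Stu I": ("GR", "P"),        # Gly-Arg-[protein]-Pro
}
AVG_RESIDUE_DA = 110  # rough average residue mass; an assumption

def mature_product(protein, site):
    """Mature product released from the given site (single-letter code)."""
    n_ext, c_ext = EXTENSIONS[site]
    return n_ext + protein + c_ext

def extra_residues(site):
    """Number of cassette-derived residues added at the given site."""
    n_ext, c_ext = EXTENSIONS[site]
    return len(n_ext) + len(c_ext)

difference = extra_residues("Sma I") - extra_residues("Stu I")
print(extra_residues("Sma I"), extra_residues("Stu I"), difference)
print(f"about {difference * AVG_RESIDUE_DA / 1000:.1f} kDa larger at the Sma I site")
```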
EXAMPLE 4
PROTEOLYTIC PROCESSING OF TWO OPEN
READING FRAMES FROM UNRELATED PROTEINS



Another construct, pPRO1.SNG, shown in Figure 5A, consisted of the SMV CP ORF positioned at the Sma I insertion site of pPRO1 and the open reading frame encoding β-glucuronidase (GUS) activity at the Stu I insertion site. As shown in Figure 5B, in vitro translation in the presence of 35S-Met generated the expected profile of mature proteins. The polypeptide synthesized upon translation of this construct has a predicted size of about 149 kDa and is the largest that has been tested with the pPRO1 expression cassette. In this particular case, a high molecular weight band corresponding to a polypeptide of approximately 110 kDa was present in relatively low amounts. This protein probably corresponds to a fusion of the NIa and GUS peptides, implying that processing was not complete.

A time-course in vitro translation reaction programmed with construct pPRO1.SNG, with samples withdrawn at 5, 10, 15, 20, 30, 45, 60 and 90 minutes, showed the predicted increase in accumulation of the expected proteins with time, as analyzed by SDS-PAGE (10% polyacrylamide) and autoradiography (Figure 5C). Even at short incubation times (15 min), no 149 kDa precursor could be detected, indicating efficient co-translational processing. However, pulse-chase experiments with this construct did not demonstrate significant post-translational processing of the low amounts of the 110 kDa polypeptide (data not shown).

The foregoing description of the invention is exemplary for purposes of illustration and explanation. It should be understood that various modifications can be made without departing from the spirit and scope of the invention. Accordingly, the following claims are intended to be interpreted to embrace all such modifications.



Sequence ID No. 1 is an amino acid sequence for the consensus heptapeptide cleavage 
sequences that are cleaved by the NIa from TEV. 

Sequence ID No. 2 is an amino acid sequence for the consensus heptapeptide cleavage 
sequences that are cleaved by the NIa from TEV. 

Sequence ID No. 3 is an amino acid sequence for a self-recognized heptapeptide cleavage sequence at the N terminus of NIa in TEV.

Sequence ID No. 4 is an amino acid sequence for a self-recognized heptapeptide cleavage sequence at the C terminus of NIa in TEV.

Sequence ID No. 5 is a nucleotide sequence for PRO1 (Figure 1B).

Sequence ID No. 6 is an amino acid sequence for PRO1 (Figure 1B).

Sequence ID No. 7 is a nucleotide sequence for a primer (TEVNIA.N) for amplification 
and cloning of cDNA encoding the nuclear inclusion a protein of tobacco etch potyvirus. 

Sequence ID No. 8 is a nucleotide sequence for a primer (TEVNIA.C) for amplification
and cloning of cDNA encoding the nuclear inclusion a protein of tobacco etch potyvirus.

Sequence ID No. 9 is a nucleotide sequence for a primer (TEVNTR.5) for amplification 
and cloning of the 5' untranslated region of tobacco etch potyvirus. 
Sequence ID No. 10 is a nucleotide sequence for a primer (TEVNTR.3) for amplification
and cloning of the 5' untranslated region of tobacco etch potyvirus. 

Sequence ID No. 11 is a nucleotide sequence for a primer (TEVNIA.N2) for amplification
and cloning of cDNA encoding the nuclear inclusion protein of tobacco etch potyvirus. 

Sequence ID No. 12 is a nucleotide sequence for a primer (TEVNIA.C3) for amplification
and cloning of cDNA encoding the nuclear inclusion protein of tobacco etch potyvirus. 

Sequence ID No. 13 is a nucleotide sequence for a primer (TMVCP.51) for amplification
and cloning of cDNA encoding the tobacco mosaic virus coat protein. 

Sequence ID No. 14 is a nucleotide sequence for a primer (TMVCP.31) for amplification
and cloning of cDNA encoding the tobacco mosaic virus coat protein.

Sequence ID No. 15 is a nucleotide sequence for a primer (SMVCP.N1) for amplification 
and cloning of cDNA encoding the soybean mosaic virus coat protein. 

Sequence ID No. 16 is a nucleotide sequence for a primer (SMVCP.C2) for amplification 
and cloning of cDNA encoding the soybean mosaic virus coat protein. 

Sequence ID No. 17 is a nucleotide sequence for a primer (GUS.N2) for
amplification and cloning of cDNA encoding β-glucuronidase.

Sequence ID No. 18 is a nucleotide sequence for a primer (GUS.C1) for amplification and
cloning of cDNA encoding β-glucuronidase.
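Two properties of the listed sequences can be checked with a few lines of code: the CDS of SEQ ID NO. 5 (156..1481) should contain exactly 442 codons, the stated length of SEQ ID NO. 6 (the TGA stop at 1482-1484 lies outside the annotated CDS), and primer TEVNIA.C, being a reverse primer, should reverse-complement to coding-strand sequence encoding the C-terminal cleavage sequence Glu-Leu-Val-Tyr-Ser-Gln-Gly followed by Arg-Pro and a stop, as drawn in Figure 1. The Python sketch below is one way to do this; the helper names are illustrative and the codon table is the standard genetic code.

```python
# Standard genetic code in compact form (bases ordered T, C, A, G).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {a + b + c: AMINO[16 * i + 4 * j + k]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)
         for k, c in enumerate(BASES)}

def translate(dna):
    """Translate the complete codons of dna; '*' marks a stop codon."""
    return "".join(CODON[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def revcomp(dna):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

# CDS feature of SEQ ID NO. 5, 1-based and inclusive. For a sequence string s,
# the same stretch in Python is s[CDS_START - 1:CDS_END].
CDS_START, CDS_END = 156, 1481
cds_length = CDS_END - CDS_START + 1
print(cds_length, cds_length % 3, cds_length // 3)

# TEVNIA.C (SEQ ID NO. 8) is the reverse complement of the coding strand,
# so reverse-complementing it recovers coding-strand sequence.
TEVNIA_C = "GCGAATTCAAGGCCTCCCTTGCGAGTACACCAATTCA"
coding = revcomp(TEVNIA_C)
print(coding)
print(translate(coding[1:31]))  # reading frame of the NIa ORF
```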
(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:
(A) NAME/KEY: Peptide
(B) LOCATION: 1..7
(D) OTHER INFORMATION: /note= "where X appears, X can be any amino acid"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

Glu Xaa Xaa Tyr Xaa Gln Gly
1               5

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:
(A) NAME/KEY: Peptide
(B) LOCATION: 1..7
(D) OTHER INFORMATION: /note= "where X appears, X can be any amino acid"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Glu Xaa Xaa Tyr Xaa Gln Ser
1               5

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:
(A) NAME/KEY: Peptide
(B) LOCATION: 1..7

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

Glu Pro Val Tyr Phe Gln Gly
1               5

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:
(A) NAME/KEY: Peptide
(B) LOCATION: 1..7

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Glu Leu Val Tyr Ser Gln Gly
1               5

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1488 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: PRO1

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 156..1481

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

GAGCTCAGAT CTAAATAACA AATCTCAACA CAACATATAC AAAACAAACG AATCTCAAGC    60

AATCAAGCAT TCTACTTCTA TTGCAGCAAT TTAAATCATT TCTTTTAAAG CAAAAGCAAT    120

TTTCTGAAAA TTTTCACCAT TTACGAACGA TAGCC ATG CCC GGG GAA CCA GTC    173
                                       Met Pro Gly Glu Pro Val
                                       1               5

TAT TTC CAA GGG AAG AAG AAT CAG AAG CAC AAG CTT AAG ATG AGA GAG    221
Tyr Phe Gln Gly Lys Lys Asn Gln Lys His Lys Leu Lys Met Arg Glu
            10                  15                  20

GCG CGT GGG GCT AGA GGG CAA TAT GAG GTT GCA GCG GAC GCA GGG GCG    269
Ala Arg Gly Ala Arg Gly Gln Tyr Glu Val Ala Ala Asp Ala Gly Ala
        25                  30                  35

CTA GAA CAT TAC TTT GGA AGC GCA TAT AAT AAC AAA GGA AAG CGC AAG    317
Leu Glu His Tyr Phe Gly Ser Ala Tyr Asn Asn Lys Gly Lys Arg Lys
    40                  45                  50

GGC ACC ACG AGA GGA ATG GGT GCA AAG TCT CGG AAA TTC ATA AAC ATG    365
Gly Thr Thr Arg Gly Met Gly Ala Lys Ser Arg Lys Phe Ile Asn Met
55                  60                  65                  70

TAT GGG TTT GAT CCA ACT GAT TTT TCA TAC ATT AGG TTT GTG GAT CCA    413
Tyr Gly Phe Asp Pro Thr Asp Phe Ser Tyr Ile Arg Phe Val Asp Pro
                75                  80                  85

TTG ACA GGT CAC ACT ATT GAT GAG TCC ACA AAC GCA CCT ATT GAT TTA    461
Leu Thr Gly His Thr Ile Asp Glu Ser Thr Asn Ala Pro Ile Asp Leu
            90                  95                  100

GTG CAG CAT GAG TTT GGA AAG GTT AGA ACA CGC ATG TTA ATT GAC GAT    509
Val Gln His Glu Phe Gly Lys Val Arg Thr Arg Met Leu Ile Asp Asp
        105                 110                 115

GAG ATA GAG CCT CAA AGT CTT AGC ACC CAC ACC ACA ATC CAT GCT TAT    557
Glu Ile Glu Pro Gln Ser Leu Ser Thr His Thr Thr Ile His Ala Tyr
    120                 125                 130

TTG GTG AAT AGT GGC ACG AAG AAA GTT CTT AAG GTT GAT TTA ACA CCA    605
Leu Val Asn Ser Gly Thr Lys Lys Val Leu Lys Val Asp Leu Thr Pro
135                 140                 145                 150

CAC TCG TCG CTA CGT GCG AGT GAG AAA TCA ACA GCA ATA ATG GGA TTT    653
His Ser Ser Leu Arg Ala Ser Glu Lys Ser Thr Ala Ile Met Gly Phe
                155                 160                 165

CCT GAA AGG GAG AAT GAA TTG CGT CAA ACC GGC ATG GCA GTG CCA GTG    701
Pro Glu Arg Glu Asn Glu Leu Arg Gln Thr Gly Met Ala Val Pro Val
            170                 175                 180

GCT TAT GAT CAA TTG CCA CCA AAG AGT GAG GAC TTG ACG TTT GAA GGA    749
Ala Tyr Asp Gln Leu Pro Pro Lys Ser Glu Asp Leu Thr Phe Glu Gly
        185                 190                 195
WO 95/21249 



PCT/US95/01495 



GAA AGC TTG TTT AAG GGA CCA CGT GAT TAC AAC CCG ATA TCG AGC ACC    797
Glu Ser Leu Phe Lys Gly Pro Arg Asp Tyr Asn Pro Ile Ser Ser Thr
    200                 205                 210

ATT TGT CAC TTG ACG AAT GAA TCT GAT GGG CAC ACA ACA TCG TTG TAT    845
Ile Cys His Leu Thr Asn Glu Ser Asp Gly His Thr Thr Ser Leu Tyr
215                 220                 225                 230

GGT ATT GGA TTT GGT CCC TTC ATC ATT ACA AAC AAG CAC TTG TTT AGA    893
Gly Ile Gly Phe Gly Pro Phe Ile Ile Thr Asn Lys His Leu Phe Arg
                235                 240                 245

AGA AAT AAT GGA ACA CTG TTG GTC CAA TCA CTA CAT GGT GTA TTC AAG    941
Arg Asn Asn Gly Thr Leu Leu Val Gln Ser Leu His Gly Val Phe Lys
            250                 255                 260

GTC AAG AAC ACC ACG ACT TTG CAA CAA CAC CTC ATT GAT GGG AGG GAC    989
Val Lys Asn Thr Thr Thr Leu Gln Gln His Leu Ile Asp Gly Arg Asp
        265                 270                 275

ATG ATA ATT ATT CGC ATG CCT AAG GAT TTC CCA CCA TTT CCT CAA AAG    1037
Met Ile Ile Ile Arg Met Pro Lys Asp Phe Pro Pro Phe Pro Gln Lys
    280                 285                 290

CTG AAA TTT AGA GAG CCA CAA AGG GAA GAG CGC ATA TGT CTT GTG ACA    1085
Leu Lys Phe Arg Glu Pro Gln Arg Glu Glu Arg Ile Cys Leu Val Thr
295                 300                 305                 310

ACC AAC TTC CAA ACT AAG AGC ATG TCT AGC ATG GTG TCA GAC ACT AGT    1133
Thr Asn Phe Gln Thr Lys Ser Met Ser Ser Met Val Ser Asp Thr Ser
                315                 320                 325

TGC ACA TTC CCT TCA TCT GAT GGC ATA TTC TGG AAG CAT TGG ATT CAA    1181
Cys Thr Phe Pro Ser Ser Asp Gly Ile Phe Trp Lys His Trp Ile Gln
            330                 335                 340

ACC AAG GAT GGG CAG TGT GGC AGT CCA TTA GTA TCA ACT AGA GAT GGG    1229
Thr Lys Asp Gly Gln Cys Gly Ser Pro Leu Val Ser Thr Arg Asp Gly
        345                 350                 355

TTC ATT GTT GGT ATA CAC TCA GCA TCG AAT TTC ACC AAC ACA AAC AAT    1277
Phe Ile Val Gly Ile His Ser Ala Ser Asn Phe Thr Asn Thr Asn Asn
    360                 365                 370

TAT TTC ACA AGC GTG CCG AAA AAC TTC ATG GAA TTG TTG ACA AAT CAG    1325
Tyr Phe Thr Ser Val Pro Lys Asn Phe Met Glu Leu Leu Thr Asn Gln
375                 380                 385                 390

GAG GCG CAG CAG TGG GTT AGT GGT TGG CGA TTA AAT GCT GAC TCA GTA    1373
Glu Ala Gln Gln Trp Val Ser Gly Trp Arg Leu Asn Ala Asp Ser Val
                395                 400                 405

TTG TGG GGG GGC CAT AAA GTT TTC ATG AGC AAA CCT GAA GAG CCT TTT    1421
Leu Trp Gly Gly His Lys Val Phe Met Ser Lys Pro Glu Glu Pro Phe
            410                 415                 420

CAG CCA GTT AAG GAA GCG ACT CAA CTC ATG AGT GAA TTG GTG TAC TCG    1469
Gln Pro Val Lys Glu Ala Thr Gln Leu Met Ser Glu Leu Val Tyr Ser
        425                 430                 435

CAA GGG AGG CCT TGAATTC    1488
Gln Gly Arg Pro
    440



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 442 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Met Pro Gly Glu Pro Val Tyr Phe Gln Gly Lys Lys Asn Gln Lys His
1               5                   10                  15

Lys Leu Lys Met Arg Glu Ala Arg Gly Ala Arg Gly Gln Tyr Glu Val
            20                  25                  30

Ala Ala Asp Ala Gly Ala Leu Glu His Tyr Phe Gly Ser Ala Tyr Asn
        35                  40                  45

Asn Lys Gly Lys Arg Lys Gly Thr Thr Arg Gly Met Gly Ala Lys Ser
    50                  55                  60

Arg Lys Phe Ile Asn Met Tyr Gly Phe Asp Pro Thr Asp Phe Ser Tyr
65                  70                  75                  80

Ile Arg Phe Val Asp Pro Leu Thr Gly His Thr Ile Asp Glu Ser Thr
                85                  90                  95

Asn Ala Pro Ile Asp Leu Val Gln His Glu Phe Gly Lys Val Arg Thr
            100                 105                 110

Arg Met Leu Ile Asp Asp Glu Ile Glu Pro Gln Ser Leu Ser Thr His
        115                 120                 125

Thr Thr Ile His Ala Tyr Leu Val Asn Ser Gly Thr Lys Lys Val Leu
    130                 135                 140

Lys Val Asp Leu Thr Pro His Ser Ser Leu Arg Ala Ser Glu Lys Ser
145                 150                 155                 160

Thr Ala Ile Met Gly Phe Pro Glu Arg Glu Asn Glu Leu Arg Gln Thr
                165                 170                 175

Gly Met Ala Val Pro Val Ala Tyr Asp Gln Leu Pro Pro Lys Ser Glu
            180                 185                 190

Asp Leu Thr Phe Glu Gly Glu Ser Leu Phe Lys Gly Pro Arg Asp Tyr
        195                 200                 205

Asn Pro Ile Ser Ser Thr Ile Cys His Leu Thr Asn Glu Ser Asp Gly
    210                 215                 220

His Thr Thr Ser Leu Tyr Gly Ile Gly Phe Gly Pro Phe Ile Ile Thr
225                 230                 235                 240

Asn Lys His Leu Phe Arg Arg Asn Asn Gly Thr Leu Leu Val Gln Ser
                245                 250                 255

Leu His Gly Val Phe Lys Val Lys Asn Thr Thr Thr Leu Gln Gln His
            260                 265                 270

Leu Ile Asp Gly Arg Asp Met Ile Ile Ile Arg Met Pro Lys Asp Phe
        275                 280                 285

Pro Pro Phe Pro Gln Lys Leu Lys Phe Arg Glu Pro Gln Arg Glu Glu
    290                 295                 300

Arg Ile Cys Leu Val Thr Thr Asn Phe Gln Thr Lys Ser Met Ser Ser
305                 310                 315                 320

Met Val Ser Asp Thr Ser Cys Thr Phe Pro Ser Ser Asp Gly Ile Phe
                325                 330                 335

Trp Lys His Trp Ile Gln Thr Lys Asp Gly Gln Cys Gly Ser Pro Leu
            340                 345                 350

Val Ser Thr Arg Asp Gly Phe Ile Val Gly Ile His Ser Ala Ser Asn
        355                 360                 365

Phe Thr Asn Thr Asn Asn Tyr Phe Thr Ser Val Pro Lys Asn Phe Met
    370                 375                 380

Glu Leu Leu Thr Asn Gln Glu Ala Gln Gln Trp Val Ser Gly Trp Arg
385                 390                 395                 400

Leu Asn Ala Asp Ser Val Leu Trp Gly Gly His Lys Val Phe Met Ser
                405                 410                 415

Lys Pro Glu Glu Pro Phe Gln Pro Val Lys Glu Ala Thr Gln Leu Met
            420                 425                 430

Ser Glu Leu Val Tyr Ser Gln Gly Arg Pro
        435                 440
(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 35 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: TEVNIA.N

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..35

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

GCTCTAGACC CGGGGAACCA GTCTATTTCC AAGGG    35

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 37 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: TEVNIA.C

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..37

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

GCGAATTCAA GGCCTCCCTT GCGAGTACAC CAATTCA    37

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 38 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: TEVNTR.5

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..38

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

GCCGAGCTCA GATCTAAATA ACAAATCTCA ACACAACA    38

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 31 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: TEVNTR.3

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..31

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

TCCCCCGGGC ATGGCTATCG TTCGTAAATG G    31

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: TEVNIA.N2

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..30

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

TGGCCCGGGG AACCAGTCTA TTTCCATGGG    30

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 37 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: TEVNIA.C3

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..37

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

GCGAATTCAA GGCCTCCCAT GGGAGTACAC CAATTCA    37

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 28 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: TMVCP.51

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..28

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

AAAGGCCTTC TTACAGTATC ACTACTCC    28
(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: TMVCP.31

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..29

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

AGGCCCGGGA GTTGCAGGAC CAGAGGTCC    29

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: SMVCP.N1

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..24

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

AAAGGCCTTC AGGCAAGGAG AAGG    24

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: SMVCP.C2

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..26

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

AGGCCCGGGC TGCGGTGGGC CCATGC    26

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: GUS.N2

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..25

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

AAAGGCCTGT AGAAACCCCA ACCCG    25

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: GUS.C1

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..29

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

CGGAATTCTC ATTGTTTGCC TCCCTGCTG    29
CLAIMS



1. An expression cassette comprising:
a nucleotide sequence encoding: 

a) the nuclear inclusion (NIa) protease from tobacco etch virus; 

b) multiple restriction endonuclease sites; and 

c) self-cleavage sites for the protease, wherein the self-cleavage sites flank 
the protease and each restriction site, except at the termini of the 
nucleotide sequence. 

2. An expression cassette vector comprising:

a) a nucleotide sequence encoding: 

the nuclear inclusion (NIa) protease from tobacco etch virus; 
multiple restriction endonuclease sites; 

self-cleavage sites for the protease, wherein the self-cleavage sites flank 
the protease and each restriction site, except at the termini of the 
nucleotide sequence; and 

b) expression control elements operably linked to the nucleotide sequence. 

3. An expression cassette vector comprising:

a) a nucleotide sequence encoding: 

the nuclear inclusion protein (NIa) from tobacco etch virus flanked by 
self-cleavage sequences therefor; and 

restriction endonuclease sites flanking the self-cleavage sequences; and 

b) expression control elements operably linked to the nucleotide sequence. 
4. The vector of claim 2 wherein the nucleotide sequence further comprises:

a) an N-terminal start codon; and

b) a C-terminal stop codon.

5. The vector of claim 2 wherein at least one of the cleavage sequences encodes the 
amino acid sequence Sequence ID No. 1, wherein X is any amino acid. 

6. The vector of claim 3 wherein at least one of the cleavage sequences encodes the 
amino acid sequence Sequence ID No. 2, wherein X is any amino acid. 

7. The vector of claim 6 wherein the nucleotide sequence further comprises, upstream of the open reading frames therein, the 5' non-translated region from TEV RNA.

8. The vector of claim 2 wherein the N-terminus cleavage sequence encodes the amino acid sequence Sequence ID No. 3.

9. The vector of claim 8 wherein the C-terminus cleavage sequence encodes the amino acid sequence Sequence ID No. 4.

10. The vector of claim 2 wherein the restriction sites are blunt-ended. 

11. The vector of claim 2 wherein the restriction sites are unique.

12. The cassette of claim 1 having the nucleotide sequence of Sequence ID No. 5. 

13. The vector of claim 2 wherein one of the restriction endonuclease sites is a 
multiple restriction site. 




14. The vector of claim 2 or 3 wherein a nucleotide sequence encoding a heterologous 
protein is inserted into each restriction endonuclease site. 

15. An expression cell comprising the vector of claim 2.

16. An expression cell comprising the vector of claim 3.

17. The expression cell of claim 15 wherein the cell is a plant cell.

18. The expression cell of claim 15 wherein the cell is a prokaryotic cell.

19. A method for obtaining heterogeneous peptides in equimolar amounts comprising: 

a) cleaving two or more of the restriction endonuclease sites with enzymes
specific therefor; 

b) inserting DNA encoding a heterogeneous peptide into each cleaved 
restriction site; 

c) transfecting a suitable cell with the vector; 

d) culturing the transformed cell; and 

e) obtaining the heterogeneous peptides in equimolar amounts. 

20. The method of claim 19 wherein the cell is a plant cell. 

21. The method of claim 20 wherein the plant cell is a plant protoplast and the 
culturing is in vitro. 

22. The method of claim 19 wherein the cell is in a leaf of a plant and the culturing 
is in vivo. 
23. The method of claim 19 wherein the cell is a prokaryote.



24. The vector of claim 2 or 3 wherein the promoter is the T7 polymerase promoter 
and the vector is derived from the infectious cDNA clone of TMV. 



25. A plant cell infected with the vector of claim 24. 

RECOGNITION SEQUENCE (Sm)
Met Pro Gly Glu Pro Val Tyr Phe Gln Gly
ATGCCCGGGGAACCAGTCTATTTCCAAGGG

RECOGNITION SEQUENCE (St, Ec)
Glu Leu Val Tyr Ser Gln Gly Arg Pro STOP
GAATTGGTGTACTCGCAAGGGAGGCCTTGAATTC

Cassette map labels: Sc, Bg, TEV - NIa, Ba, Sa

Sm: Pro-X-PROTEIN-X-Gly-Glu-Pro-Val-Tyr-Phe-Gln
St: Gly-Arg-X-PROTEIN-X-Pro
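The two recognition sequences above can be checked with a short Python sketch. It translates them with the standard genetic code, locates the restriction sites, and shows why a protein inserted at Sm reads Pro-X-PROTEIN-X-Gly while one inserted at St reads Arg-X-PROTEIN-X-Pro. The sketch assumes that Sm, St and Ec denote SmaI (CCC^GGG), StuI (AGG^CCT) and EcoRI (GAATTC), whose recognition sequences occur in the DNA shown. Both SmaI and StuI cut bluntly at the centre of their sites, so an insert whose length is a multiple of three stays in frame. The GCTGCT (Ala-Ala) fragment is a hypothetical stand-in for a real insert.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumed identities of the sites labeled Sm, St and Ec in the figure.
SITES = {"SmaI": "CCCGGG", "StuI": "AGGCCT", "EcoRI": "GAATTC"}

N_TERMINAL = "ATGCCCGGGGAACCAGTCTATTTCCAAGGG"
C_TERMINAL = "GAATTGGTGTACTCGCAAGGGAGGCCTTGAATTC"


def translate(dna: str) -> str:
    """Translate from the first base, stopping after the first stop codon ('*')."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def find_sites(dna: str) -> dict[str, list[int]]:
    """Return the 0-based start positions of each recognition site."""
    return {
        name: [i for i in range(len(dna) - len(site) + 1) if dna.startswith(site, i)]
        for name, site in SITES.items()
    }


def insert_blunt(dna: str, site: str, fragment: str) -> str:
    """Insert a fragment at the centre of a 6-bp blunt-cut site (SmaI or StuI)."""
    if len(fragment) % 3:
        raise ValueError("fragment must be a whole number of codons to stay in frame")
    cut = dna.index(site) + 3
    return dna[:cut] + fragment + dna[cut:]


print(translate(N_TERMINAL))   # MPGEPVYFQG
print(translate(C_TERMINAL))   # ELVYSQGRP*
print(find_sites(C_TERMINAL))  # {'SmaI': [], 'StuI': [21], 'EcoRI': [28]}
print(translate(insert_blunt(N_TERMINAL, SITES["SmaI"], "GCTGCT")))  # MPAAGEPVYFQG
print(translate(insert_blunt(C_TERMINAL, SITES["StuI"], "GCTGCT")))  # ELVYSQGRAAP*
```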

PR01 Map2 (1 > 1488) Site and Sequence
Enzymes: 49 of 207 enzymes (Filtered)
Settings: Linear, Certain & Uncertain Sites, Standard Genetic Code

GAGCTCAGATCTAAATAACAAATCTCAACACAACATATACAAAACAAACGAATCTCAAGCAATCAAGCAT  70
TEV 5' Leader

TCTACTTCTATTGCAGCAATTTAAATCATTTCTTTTAAAGCAAAAGCAATTTTCTGAAAATTTTCACCAT  140
TEV 5' Leader

TTACGAACGATAGCCATGCCCGGGGAACCAGTCTATTTCCAAGGGAAGAAGAATCAGAAGCACAAGCTTA  210
TEV 5' Leader | Cleavage Seq. | TEV - NIa
Met Pro Gly Glu Pro Val Tyr Phe Gln Gly Lys Lys Asn Gln Lys His Lys Leu

AGATGAGAGAGGCGCGTGGGGCTAGAGGGCAATATGAGGTTGCAGCGGACGCAGGGGCGCTAGAACATTA  280
TEV - NIa
Lys Met Arg Glu Ala Arg Gly Ala Arg Gly Gln Tyr Glu Val Ala Ala Asp Ala Gly Ala Leu Glu His Tyr

Figure 18

CTTTGGAAGCGCATATAATAACAAAGGAAAGCGCAAGGGCACCACGAGAGGAATGGGTGCAAAGTCTCGG  350
TEV - NIa
Phe Gly Ser Ala Tyr Asn Asn Lys Gly Lys Arg Lys Gly Thr Thr Arg Gly Met Gly Ala Lys Ser Arg

AAATTCATAAACATGTATGGGTTTGATCCAACTGATTTTTCATACATTAGGTTTGTGGATCCATTGACAG  420
TEV - NIa
Lys Phe Ile Asn Met Tyr Gly Phe Asp Pro Thr Asp Phe Ser Tyr Ile Arg Phe Val Asp Pro Leu Thr

GTCACACTATTGATGAGTCCACAAACGCACCTATTGATTTAGTGCAGCATGAGTTTGGAAAGGTTAGAAC  490
TEV - NIa
Gly His Thr Ile Asp Glu Ser Thr Asn Ala Pro Ile Asp Leu Val Gln His Glu Phe Gly Lys Val Arg Thr

ACGCATGTTAATTGACGATGAGATAGAGCCTCAAAGTCTTAGCACCCACACCACAATCCATGCTTATTTG  560
TEV - NIa
Arg Met Leu Ile Asp Asp Glu Ile Glu Pro Gln Ser Leu Ser Thr His Thr Thr Ile His Ala Tyr Leu

GTGAATAGTGGCACGAAGAAAGTTCTTAAGGTTGATTTAACACCACACTCGTCGCTACGTGCGAGTGAGA  630
TEV - NIa
Val Asn Ser Gly Thr Lys Lys Val Leu Lys Val Asp Leu Thr Pro His Ser Ser Leu Arg Ala Ser Glu

AATCAACAGCAATAATGGGATTTCCTGAAAGGGAGAATGAATTGCGTCAAACCGGCATGGCAGTGCCAGT  700
TEV - NIa
Lys Ser Thr Ala Ile Met Gly Phe Pro Glu Arg Glu Asn Glu Leu Arg Gln Thr Gly Met Ala Val Pro Val

GGCTTATGATCAATTGCCACCAAAGAGTGAGGACTTGACGTTTGAAGGAGAAAGCTTGTTTAAGGGACCA  770
TEV - NIa
Ala Tyr Asp Gln Leu Pro Pro Lys Ser Glu Asp Leu Thr Phe Glu Gly Glu Ser Leu Phe Lys Gly Pro

CGTGATTACAACCCGATATCGAGCACCATTTGTCACTTGACGAATGAATCTGATGGGCACACAACATCGT  840
TEV - NIa
Arg Asp Tyr Asn Pro Ile Ser Ser Thr Ile Cys His Leu Thr Asn Glu Ser Asp Gly His Thr Thr Ser

TGTATGGTATTGGATTTGGTCCCTTCATCATTACAAACAAGCACTTGTTTAGAAGAAATAATGGAACACT  910
TEV - NIa
Leu Tyr Gly Ile Gly Phe Gly Pro Phe Ile Ile Thr Asn Lys His Leu Phe Arg Arg Asn Asn Gly Thr Leu

GTTGGTCCAATCACTACATGGTGTATTCAAGGTCAAGAACACCACGACTTTGCAACAACACCTCATTGAT  980
TEV - NIa
Leu Val Gln Ser Leu His Gly Val Phe Lys Val Lys Asn Thr Thr Thr Leu Gln Gln His Leu Ile Asp

GGGAGGGACATGATAATTATTCGCATGCCTAAGGATTTCCCACCATTTCCTCAAAAGCTGAAATTTAGAG  1050
TEV - NIa
Gly Arg Asp Met Ile Ile Ile Arg Met Pro Lys Asp Phe Pro Pro Phe Pro Gln Lys Leu Lys Phe Arg

AGCCACAAAGGGAAGAGCGCATATGTCTTGTGACAACCAACTTCCAAACTAAGAGCATGTCTAGCATGGT  1120
TEV - NIa
Glu Pro Gln Arg Glu Glu Arg Ile Cys Leu Val Thr Thr Asn Phe Gln Thr Lys Ser Met Ser Ser Met Val

GTCAGACACTAGTTGCACATTCCCTTCATCTGATGGCATATTCTGGAAGCATTGGATTCAAACCAAGGAT  1190
TEV - NIa
Ser Asp Thr Ser Cys Thr Phe Pro Ser Ser Asp Gly Ile Phe Trp Lys His Trp Ile Gln Thr Lys Asp

GGGCAGTGTGGCAGTCCATTAGTATCAACTAGAGATGGGTTCATTGTTGGTATACACTCAGCATCGAATT  1260
TEV - NIa
Gly Gln Cys Gly Ser Pro Leu Val Ser Thr Arg Asp Gly Phe Ile Val Gly Ile His Ser Ala Ser Asn

TCACCAACACAAACAATTATTTCACAAGCGTGCCGAAAAACTTCATGGAATTGTTGACAAATCAGGAGGC  1330
TEV - NIa
Phe Thr Asn Thr Asn Asn Tyr Phe Thr Ser Val Pro Lys Asn Phe Met Glu Leu Leu Thr Asn Gln Glu Ala

GCAGCAGTGGGTTAGTGGTTGGCGATTAAATGCTGACTCAGTATTGTGGGGGGGCCATAAAGTTTTCATG  1400
TEV - NIa
Gln Gln Trp Val Ser Gly Trp Arg Leu Asn Ala Asp Ser Val Leu Trp Gly Gly His Lys Val Phe Met

AGCAAACCTGAAGAGCCTTTTCAGCCAGTTAAGGAAGCGACTCAACTCATGAGTGAATTGGTGTACTCGC  1470
TEV - NIa | Cleavage Seq.
Ser Lys Pro Glu Glu Pro Phe Gln Pro Val Lys Glu Ala Thr Gln Leu Met Ser Glu Leu Val Tyr Ser

AAGGGAGGCCTTGAATTC  1488
Cleavage Seq.
Gln Gly Arg Pro STOP
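The Python sketch below checks the internal consistency of the Figure 18 map. It joins the top-strand lines as read from the map, finds the first ATG, and translates in frame with the standard genetic code (NCBI table 1) up to the first stop codon. With this transcription, the open reading frame begins at position 156 with the N-terminal recognition sequence (MPGEPVYFQG), continues through TEV NIa, and ends with ELVYSQGRP before the TGA stop. The function name `first_orf` is illustrative.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Top strand of Figure 18, one 70-nt map line per string (positions 1-1488).
FIGURE_18 = "".join([
    "GAGCTCAGATCTAAATAACAAATCTCAACACAACATATACAAAACAAACGAATCTCAAGCAATCAAGCAT",
    "TCTACTTCTATTGCAGCAATTTAAATCATTTCTTTTAAAGCAAAAGCAATTTTCTGAAAATTTTCACCAT",
    "TTACGAACGATAGCCATGCCCGGGGAACCAGTCTATTTCCAAGGGAAGAAGAATCAGAAGCACAAGCTTA",
    "AGATGAGAGAGGCGCGTGGGGCTAGAGGGCAATATGAGGTTGCAGCGGACGCAGGGGCGCTAGAACATTA",
    "CTTTGGAAGCGCATATAATAACAAAGGAAAGCGCAAGGGCACCACGAGAGGAATGGGTGCAAAGTCTCGG",
    "AAATTCATAAACATGTATGGGTTTGATCCAACTGATTTTTCATACATTAGGTTTGTGGATCCATTGACAG",
    "GTCACACTATTGATGAGTCCACAAACGCACCTATTGATTTAGTGCAGCATGAGTTTGGAAAGGTTAGAAC",
    "ACGCATGTTAATTGACGATGAGATAGAGCCTCAAAGTCTTAGCACCCACACCACAATCCATGCTTATTTG",
    "GTGAATAGTGGCACGAAGAAAGTTCTTAAGGTTGATTTAACACCACACTCGTCGCTACGTGCGAGTGAGA",
    "AATCAACAGCAATAATGGGATTTCCTGAAAGGGAGAATGAATTGCGTCAAACCGGCATGGCAGTGCCAGT",
    "GGCTTATGATCAATTGCCACCAAAGAGTGAGGACTTGACGTTTGAAGGAGAAAGCTTGTTTAAGGGACCA",
    "CGTGATTACAACCCGATATCGAGCACCATTTGTCACTTGACGAATGAATCTGATGGGCACACAACATCGT",
    "TGTATGGTATTGGATTTGGTCCCTTCATCATTACAAACAAGCACTTGTTTAGAAGAAATAATGGAACACT",
    "GTTGGTCCAATCACTACATGGTGTATTCAAGGTCAAGAACACCACGACTTTGCAACAACACCTCATTGAT",
    "GGGAGGGACATGATAATTATTCGCATGCCTAAGGATTTCCCACCATTTCCTCAAAAGCTGAAATTTAGAG",
    "AGCCACAAAGGGAAGAGCGCATATGTCTTGTGACAACCAACTTCCAAACTAAGAGCATGTCTAGCATGGT",
    "GTCAGACACTAGTTGCACATTCCCTTCATCTGATGGCATATTCTGGAAGCATTGGATTCAAACCAAGGAT",
    "GGGCAGTGTGGCAGTCCATTAGTATCAACTAGAGATGGGTTCATTGTTGGTATACACTCAGCATCGAATT",
    "TCACCAACACAAACAATTATTTCACAAGCGTGCCGAAAAACTTCATGGAATTGTTGACAAATCAGGAGGC",
    "GCAGCAGTGGGTTAGTGGTTGGCGATTAAATGCTGACTCAGTATTGTGGGGGGGCCATAAAGTTTTCATG",
    "AGCAAACCTGAAGAGCCTTTTCAGCCAGTTAAGGAAGCGACTCAACTCATGAGTGAATTGGTGTACTCGC",
    "AAGGGAGGCCTTGAATTC",
])


def first_orf(dna: str) -> tuple[int, str]:
    """Return the 0-based start of the first ATG and its protein, excluding the stop."""
    start = dna.find("ATG")
    if start < 0:
        raise ValueError("no ATG found")
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            return start, "".join(protein)
        protein.append(aa)
    raise ValueError("no in-frame stop codon found")


start, protein = first_orf(FIGURE_18)
print(len(FIGURE_18))              # 1488
print(start + 1, len(protein))     # 156 442
print(protein[:10], protein[-9:])  # MPGEPVYFQG ELVYSQGRP
```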